Spectral Clustering with Perturbed Data

نویسندگان

  • Ling Huang
  • Donghui Yan
  • Michael I. Jordan
  • Nina Taft
چکیده

Spectral clustering is useful for a wide-ranging set of applications in areas such as biological data analysis, image processing and data mining. However, the computational and/or communication resources required by the method in processing large-scale data are often prohibitively high, and practitioners are often required to perturb the original data in various ways (quantization, downsampling, etc) before invoking a spectral algorithm. In this paper, we use stochastic perturbation theory to study the effects of data perturbation on the performance of spectral clustering. We show that the error under perturbation of spectral clustering is closely related to the perturbation of the eigenvectors of the Laplacian matrix. From this result we derive approximate upper bounds on the clustering error. We show that this bound is tight empirically across a wide range of problems, suggesting that it can be used in practical settings to determine the amount of data reduction allowed in order to meet a specification of permitted loss in clustering performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

[ m at h . SP ] 2 3 Ju l 2 00 7 PERTURBATION OF SPECTRA AND SPECTRAL SUBSPACES ∗

We consider the problem of variation of spectral subspaces for linear self-adjoint operators with emphasis on the case of off-diagonal perturbations. We prove a number of new optimal results on the shift of the spectrum and obtain (sharp) estimates on the norm of the difference of two spectral projections associated with isolated parts of the spectrum of the perturbed and unperturbed operators ...

متن کامل

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

متن کامل

Noise Thresholds for Spectral Clustering

Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfec...

متن کامل

Detection of perturbed quantization (PQ) steganography based on empirical matrix

Perturbed Quantization (PQ) steganography scheme is almost undetectable with the current steganalysis methods. We present a new steganalysis method for detection of this data hiding algorithm. We show that the PQ method distorts the dependencies of DCT coefficient values; especially changes much lower than significant bit planes. For steganalysis of PQ, we propose features extraction from the e...

متن کامل

Robust path-based spectral clustering

Spectral clustering and path-based clustering are two recently developed clustering approaches that have delivered impressive results in a number of challenging clustering tasks. However, they are not robust enough against noise and outliers in the data. In this paper, based on M-estimation from robust statistics, we develop a robust path-based spectral clustering method by defining a robust pa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008